Abstract

We prove the almost sure convergence and asymptotic normality of a generalization of Kesten's stochastic approximation algorithm to the multidimensional case.

In this generalization, the step increases or decreases according to whether the scalar product of two subsequent increments of the estimates is positive or negative.

This rule is intended to accelerate the entrance into the 'stochastic behaviour' regime when the initial conditions cause the algorithm to behave in a 'deterministic fashion' during the starting iterations.

1 Introduction and problem statement 

We consider the problem of finding the stationary point $x^* \in \mathbb{R}^n$ of a vector field $\varphi : \mathbb{R}^n \to \mathbb{R}^n$ using the stochastic approximation algorithm

$$x_t = x_{t-1} - \gamma(s_{t-1})\, y_t, \quad t = 1, 2, \ldots \qquad (1)$$
$$s_t = \left(s_{t-1} + u(-y_t^T y_{t-1})\right)^+, \quad t = 2, 3, \ldots \qquad (2)$$

where

• $y_t = \varphi(x_{t-1}) + \xi_t$, $y_t \in \mathbb{R}^n$, is the $t$-th measurement of $\varphi$, perturbed by the random vector $\xi_t \in \mathbb{R}^n$;

• $a^+ := \max\{a, 0\}$;

• $u$ is a sigmoid function;

• the random vector $x_0 \in \mathbb{R}^n$ and the random variables $s_0$ and $s_1$ are the initial conditions of the algorithm;

• $x_t \in \mathbb{R}^n$ is the $t$-th approximation to the stationary point $x^* \in \mathbb{R}^n$ of $\varphi$.
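The recursion (1)-(2) can be sketched in a few lines of Python. This is only an illustration under assumed choices that are not taken from the paper: a linear field $\varphi(x) = x - x^*$, the step function $\gamma(s) = 1/(1+s)$, an indicator-type $u$, Gaussian noise, and the hypothetical helper name `run_algorithm`.

```python
import random

def run_algorithm(phi, gamma, u, x0, s0, s1, steps, rng, noise_sd=1.0):
    """Iterate x_t = x_{t-1} - gamma(s_{t-1}) y_t together with the counter
    s_t = (s_{t-1} + u(-y_t . y_{t-1}))^+ for t >= 2."""
    x = list(x0)
    s_prev = s0          # s_{t-1}; s_0 is used in the first update
    y_prev = None        # y_{t-1}
    for t in range(1, steps + 1):
        # noisy measurement y_t = phi(x_{t-1}) + xi_t
        y = [p + rng.gauss(0.0, noise_sd) for p in phi(x)]
        x = [xi - gamma(s_prev) * yi for xi, yi in zip(x, y)]
        if t == 1:
            s_t = s1     # s_1 is an initial condition
        else:
            dot = sum(a * b for a, b in zip(y, y_prev))
            s_t = max(s_prev + u(-dot), 0.0)
        y_prev, s_prev = y, s_t
    return x, s_prev

# Assumed illustrative choices (hypothetical, not from the paper):
x_star = [1.0, -2.0]
phi = lambda x: [xi - ci for xi, ci in zip(x, x_star)]   # linear field
gamma = lambda s: 1.0 / (1.0 + s)                        # decreasing step
u = lambda v: 1.0 if v > 0 else 0.0                      # indicator-type counter

x_final, s_final = run_algorithm(phi, gamma, u, [5.0, -3.0], 0.0, 0.0,
                                 steps=20000, rng=random.Random(0))
print(x_final, s_final)
```

With these choices the counter $s_t$ grows only when two successive measurements point in opposing directions, so the step stays large while the iterates are still far from $x^*$.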

We suppose the following assumptions apply. 
Assumptions B1

1. $\{x_0, \xi_1, \xi_2, \ldots\}$ are mutually independent random vectors, where the vectors $\xi_i$ are identically distributed with mean zero, $E\xi_t = 0$, and finite covariance matrix $S_\xi := E\xi_t\xi_t^T$. We denote by $\mathcal{F}_t$ the $\sigma$-algebra generated by the random vectors $\{x_0, \xi_1, \xi_2, \ldots, \xi_t\}$ and the random variables $s_0$ and $s_1$. We assume $s_0$, $s_1$ are random variables independent of $\{x_0, \xi_1, \xi_2, \ldots\}$.

2. There exists a positive $\Omega$ such that for each open ball $I \subset B(\Omega)$, $P(\xi_t \in I) > 0$.

3. $E|x_0| < \infty$.
Assumptions B2

1. $\gamma(s)$ is a monotone decreasing function defined on $[0, +\infty)$, so $\gamma(0)$ denotes the maximum value of the step.

2. $\int_0^{\infty} \gamma(s)\,ds = \infty$.

3. $\int_0^{\infty} \gamma^2(s)\,ds < \infty$.
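As a quick numerical illustration of B2.2 and B2.3, the sketch below approximates both integrals for the assumed example $\gamma(s) = 1/(1+s)$ (our choice, not prescribed by the paper) on growing intervals. The first integral keeps growing like a logarithm while the second stays bounded.

```python
import math

def gamma(s):
    # assumed example step function, decreasing on [0, +inf)
    return 1.0 / (1.0 + s)

def partial_integrals(upper, n=200000):
    """Midpoint-rule approximations of the integrals of gamma and gamma^2 on [0, upper]."""
    h = upper / n
    i1 = sum(gamma((k + 0.5) * h) for k in range(n)) * h
    i2 = sum(gamma((k + 0.5) * h) ** 2 for k in range(n)) * h
    return i1, i2

for upper in (10.0, 1000.0, 100000.0):
    i1, i2 = partial_integrals(upper)
    print(upper, round(i1, 3), round(i2, 4))
```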

Assumptions B3

1. There exists a continuous function $V(x) : \mathbb{R}^n \to \mathbb{R}^+$ such that

(a) $V(x^*) = 0$;

(b) $\nabla^2 V(x) < M$ for each $x$, $M > 0$ (the largest eigenvalue of $\nabla^2 V(x)$ is less than $M$);

(c) $\varphi(x)^T \nabla V(x) > 0$ for each $x \ne x^*$;

(d) For each $\gamma^* \le \gamma(0)$ and for each $z_0$, the sequence

$$z_t = z_{t-1} - \gamma^* \varphi(z_{t-1})$$

converges deterministically to the stationary point $x^*$, and $\{V(z_t),\ t = 1, 2, \ldots\}$ is a monotonically decreasing sequence.

2. There exist positive $R$ and $\beta_0$ such that

$$\varphi(x)^T \nabla V(x) > \tfrac{1}{2}\gamma(0) \cdot \left(\varphi(x)^T M \varphi(x) + \mathrm{tr}(S_\xi M)\right) + \beta_0$$

for $|x - x^*| > R$. This condition limits the maximum step $\gamma(0)$ and guarantees $\inf_{|x - x^*| > R} |\varphi(x)| > 0$.
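Condition B3.1(d) can be illustrated on a toy case. The sketch below assumes (our choice, not the paper's) the linear field $\varphi(z) = z - x^*$ with $V(z) = |z - x^*|^2/2$, runs the deterministic iteration with a constant step, and records $V(z_t)$; the helper name `deterministic_descent` is hypothetical.

```python
def deterministic_descent(z0, x_star, step, iters):
    """Iterate z_t = z_{t-1} - step * phi(z_{t-1}) for phi(z) = z - x_star
    and record V(z_t) = |z_t - x_star|^2 / 2."""
    z = list(z0)
    values = []
    for _ in range(iters):
        z = [zi - step * (zi - ci) for zi, ci in zip(z, x_star)]
        values.append(0.5 * sum((zi - ci) ** 2 for zi, ci in zip(z, x_star)))
    return z, values

z_final, v_values = deterministic_descent([4.0, -1.0], [1.0, 2.0], step=0.3, iters=40)
print(v_values[:3], v_values[-1])
```

For this field each iteration contracts $z_t - x^*$ by the factor $1 - \gamma^*$, so $V(z_t)$ decreases monotonically for any constant step in $(0, 1]$.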
Assumptions B4

Figure 1: Examples of function $u$. (a) Case of the Robbins-Monro algorithm [8]. (b) Case of the Kesten algorithm [2]. (c) Case similar to the Plakhov-Almeida algorithm [5]. (d) Some generic case.

1. $u$ is a monotone increasing and bounded function $\mathbb{R} \to \mathbb{R}$, for which

$$u_+ = \lim_{x \to +\infty} u(x) > 0 \quad \text{and} \quad u_- = \lim_{x \to -\infty} u(x).$$

2. Denote $E_w = E[u(X^{(w)})]$ where

$$X^{(w)} = \inf_{|\varphi_1| < w,\ |\varphi_2| < w} \left[-(\xi_1 + \varphi_1)^T(\xi_2 + \varphi_2)\right].$$

Define $E_0 := \lim_{w \to 0^+} E_w$. The constant $E_0$ must be positive.

Figure 1 shows possible examples of the function $u$, including the cases of known algorithms.

Comment 1 Suppose we observe the process (1)-(2) starting at $t_0 > 1$. This new process, with initial conditions $x_{t_0}$, $s_{t_0}$, $s_{t_0+1}$ and the random sequence $\xi_{t_0}, \xi_{t_0+1}, \ldots$, also satisfies the conditions.
Lemma ^ for example, makes use of this comment. 

Comment 2 If $u$ or the distribution of $\xi_t$ is continuous, then $E_0 = E[u(-\xi_1^T\xi_2)]$. Moreover, if $u$ is continuous and satisfies $u(x) > -u(-x)$ when $x \ne 0$, then B4.2 is valid for any distribution of $\xi_t$ with nonzero variance.
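To make the constant $E_0 = E[u(-\xi_1^T\xi_2)]$ concrete, the sketch below estimates it by Monte Carlo for three candidate shapes of $u$. The shapes are our assumptions: a constant increment and an indicator are our reading of what Robbins-Monro-type and Kesten-type counters might look like, and the smooth sigmoid is a generic example; none of them is taken from Figure 1 itself. The noise is assumed standard normal.

```python
import math
import random

# Assumed illustrative shapes for u (hypothetical, not read off Figure 1):
def u_constant(v):      # counter grows by 1 at every step
    return 1.0

def u_indicator(v):     # counter grows only when successive increments oppose
    return 1.0 if v > 0 else 0.0

def u_smooth(v):        # generic smooth sigmoid with limits 0 and 1
    return 0.5 * (1.0 + math.tanh(v))

def estimate_E0(u, dim, samples, rng):
    """Monte Carlo estimate of E[u(-xi_1^T xi_2)] for standard normal noise."""
    total = 0.0
    for _ in range(samples):
        dot = sum(rng.gauss(0.0, 1.0) * rng.gauss(0.0, 1.0) for _ in range(dim))
        total += u(-dot)
    return total / samples

rng = random.Random(1)
estimates = {f.__name__: estimate_E0(f, dim=2, samples=50000, rng=rng)
             for f in (u_constant, u_indicator, u_smooth)}
print(estimates)
```

Since $-\xi_1^T\xi_2$ is symmetric about zero for this noise, the indicator and the smooth sigmoid both give an estimate close to $1/2$, which is positive as B4.2 requires.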

Comment 3 We use the following notation for $\varphi$ and $V$: $\varphi'$ denotes a matrix, $\nabla V$ a vector and $\nabla^2 V$ a matrix.



Theorem 1 Suppose Assumptions B1 to B4 are verified. Then, almost surely, $\lim_{t \to \infty} x_t = x^*$.

The assumptions for asymptotic normality are all the assumptions for almost sure convergence plus three more: Assumptions B3.3, B3.4 and B4.3.

Assumption B3.3 All eigenvalues of $\frac{I}{2} - \frac{1}{E_0}\varphi'(x^*)$ are negative, where $I$ is the identity matrix.
Assumption B3.4 Assume the Taylor decomposition for $\varphi$,

$$\frac{|\varphi(x) - \varphi'(x^*)(x - x^*)|}{|x - x^*|} = o(1), \quad \text{when } x \to x^*. \qquad (3)$$

Comment 4 From this assumption it follows that

$$\sup \frac{|\varphi(x)|}{|x - x^*|} < \infty \qquad (4)$$

because

$$\frac{|\varphi(x) - \varphi'(x^*)(x - x^*)|}{|x - x^*|} \ge \frac{|\varphi(x)|}{|x - x^*|} - |\varphi'(x^*)|$$

and so

$$\frac{|\varphi(x)|}{|x - x^*|} \le |\varphi'(x^*)| + o(1) < \infty.$$

Assumption B4.3 Assume the Taylor decomposition for the function $u$: $u(x + \Delta x) = u(x) + u'(\theta)\Delta x$ for $\theta$ between $x$ and $x + \Delta x$.

Theorem 2 Let $x_t$ be defined by (1)-(2) and suppose the assumptions for almost sure convergence hold. Suppose, in addition, that Assumptions B3.3, B3.4 and B4.3 hold. If $\gamma(s) = 1/s$ then

$$\sqrt{t}\,(x_t - x^*) \xrightarrow{d} N(0, V) \qquad (5)$$

where $\xrightarrow{d}$ denotes convergence in distribution, and $V$ is a positive definite matrix and the unique solution of the Lyapunov equation (see Theorem 3 in Section 4)

$$\left(\frac{I}{2} - \frac{1}{E_0}\varphi'(x^*)\right)(-V) + (-V)\left(\frac{I}{2} - \frac{1}{E_0}\varphi'(x^*)\right)^T = \frac{1}{E_0} S_\xi. \qquad (6)$$



Comment 5 The explicit solution of equation (6) is

$$V = \frac{1}{E_0}\int_0^{\infty} e^{W t}\, S_\xi\, e^{W^T t}\, dt$$

where $W = \frac{I}{2} - \frac{1}{E_0}\varphi'(x^*)$, and it is positive definite. A proof of this result can be found, for example, in Theorem 12.3.3 of Lancaster and Tismenetsky [3].



2 Proof of almost sure convergence 

The proof of almost sure convergence follows the work for the one-dimensional case by Plakhov and Cruz (2004) [5].

Without loss of generality we suppose $x^* = 0$, so $\varphi(x^*) = 0$.

Lemma 1 For each $\epsilon > 0$ there exists $m = m(\epsilon)$ such that, almost surely, either (i) there exists $t$ such that $|x_t| < \epsilon$, or (ii) there exists $t$ such that $|x_t| < R$ and $s_t < m$. (Recall that $R$ is defined in B3.2.)

Proof. Choose $\epsilon > 0$ and define the stopping time

$$\tau = \tau(\epsilon, m) = \inf\{t : |x_t| < \epsilon \text{ or } (|x_t| < R \text{ and } s_t < m)\}.$$

Our aim is to prove that for some $m$ we have $P(\tau = \infty) = 0$.
Consider the sequence $E_t = E[V(x_t)\, I(t < \tau)]$.

We introduce the simplified notation $V(x_t) = V_t$, $I(t < \tau) = I_t$, $\nabla V(x_t) = \nabla V_t$, $\varphi(x_t) = \varphi_t$, $\gamma(s_t) = \gamma_t$, and using that $I_t \le I_{t-1}$, we obtain

$$E_t - E_{t-1} = E[V_t I_t - V_{t-1} I_{t-1}] \le E[(V_t - V_{t-1})\, I_{t-1}]. \qquad (7)$$

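Using Taylor expansion,

$$V_t = V(x_{t-1} - \gamma_{t-1} y_t) = V_{t-1} - \gamma_{t-1}\, y_t^T \nabla V_{t-1} + \tfrac{1}{2}\gamma_{t-1}^2\, y_t^T \nabla^2 V(x')\, y_t,$$

where $x'$ is a point between $x_t$ and $x_{t-1}$. Replacing $y_t$ by $\varphi_{t-1} + \xi_t$ and, in agreement with B3.1, one obtains

$$V_t - V_{t-1} \le -\gamma_{t-1}\varphi_{t-1}^T \nabla V_{t-1} - \gamma_{t-1}\xi_t^T \nabla V_{t-1} + \tfrac{1}{2}\gamma_{t-1}^2\left(\varphi_{t-1}^T M \varphi_{t-1} + \xi_t^T M \xi_t\right). \qquad (8)$$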

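Using (7) and (8) and observing that each of the values $\gamma_{t-1}$, $\varphi_{t-1}$, $I_{t-1}$ is determined by $x_{t-1}$ and $s_{t-1}$, and so is independent of $\xi_t$ (Condition B1.1),

$$E_t - E_{t-1} \le E\left[\left(-\gamma_{t-1}\varphi_{t-1}^T\nabla V_{t-1} - \gamma_{t-1}\xi_t^T\nabla V_{t-1} + \tfrac{1}{2}\gamma_{t-1}^2(\varphi_{t-1}^T M\varphi_{t-1} + \xi_t^T M\xi_t)\right) I_{t-1}\right]$$
$$= E[-\gamma_{t-1}\varphi_{t-1}^T\nabla V_{t-1}\, I_{t-1}] + E[-\gamma_{t-1}\xi_t^T\nabla V_{t-1}\, I_{t-1}] + E[\tfrac{1}{2}\gamma_{t-1}^2(\varphi_{t-1}^T M\varphi_{t-1})\, I_{t-1}] + E[\tfrac{1}{2}\gamma_{t-1}^2\, I_{t-1}] \cdot E[\xi_t^T M\xi_t];$$

then using

• $E[\gamma_{t-1}\xi_t^T\nabla V_{t-1}\, I_{t-1}] = 0$;

• $E[\xi_t^T M\xi_t] \le \mathrm{tr}(S_\xi M)$;

we have

$$E_t - E_{t-1} \le E\left[\left(-\varphi_{t-1}^T\nabla V_{t-1} + \tfrac{1}{2}\gamma_{t-1}(\varphi_{t-1}^T M\varphi_{t-1} + \mathrm{tr}(S_\xi M))\right)\gamma_{t-1} I_{t-1}\right]. \qquad (9)$$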

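If $I_{t-1} = 1$, then (i) $|x_{t-1}| \ge R$, or (ii) $|x_{t-1}| \ge \epsilon$ and $s_{t-1} \ge m$. In case (i), using B3.2, one obtains

$$-\varphi_{t-1}^T\nabla V_{t-1} + \tfrac{1}{2}\gamma_{t-1}(\varphi_{t-1}^T M\varphi_{t-1} + \mathrm{tr}(S_\xi M)) \le -\beta_0. \qquad (10)$$

In case (ii) we have $\gamma_{t-1} \le \gamma(m)$; define $\delta_\epsilon := \inf\{\varphi(x)^T\nabla V(x), \text{ for all } |x| \ge \epsilon\}$. In this context

$$-\varphi_{t-1}^T\nabla V_{t-1} + \tfrac{1}{2}\gamma_{t-1}(\varphi_{t-1}^T M\varphi_{t-1} + \mathrm{tr}(S_\xi M)) \le -\delta_\epsilon + \tfrac{1}{2}\gamma(m)(\varphi_{t-1}^T M\varphi_{t-1} + \mathrm{tr}(S_\xi M)) := -\beta(\epsilon, m). \qquad (11)$$

We choose $m$ such that $\beta(\epsilon, m) > 0$ and denote $\beta = \inf\{\beta_0, \beta(\epsilon, m)\}$. So, in both cases, the expression inside the expectation on the right side of (9) is at most $-\beta \cdot \gamma_{t-1} I_{t-1}$, and so

$$E_t - E_{t-1} \le -\beta \cdot E[\gamma_{t-1} I_{t-1}].$$

Using that $s_t \le s_0 + t u_+$ and $E\, I_t = P(t < \tau)$ one has

$$E_t - E_{t-1} \le -\beta\, \gamma(s_0 + t u_+) \cdot P(t < \tau);$$

by $P(j < \tau) \ge P(i < \tau)$ when $j < i$ and, using an induction argument,

$$E_t \le E_0 - \beta\, P(t < \tau) \sum_{j=0}^{t-1} \gamma(s_0 + j u_+),$$

where $E_0 := E(V(x_0)\, I(0 < \tau)) < \infty$ by Assumption B1.4 (here $E_0$ is the initial term of the sequence $E_t$, not the constant of B4.2).

The function $V$ is positive for $x \ne x^*$, so $E_t \ge 0$, and from here it follows that

$$P(t < \tau) \le \frac{E_0}{\beta \sum_{j=0}^{t-1} \gamma(s_0 + j u_+)}.$$

Letting $t \to \infty$ and using $\sum_j \gamma(s_0 + j u_+) = \infty$ (inferred from Assumption B2.2), one can conclude that $P(\tau = \infty) = 0$.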

□ 

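Lemma 2 For each $\epsilon > 0$ and $m > 0$ there exists a positive $\delta$ such that if $|x_0| < R$ and $s_0 < m$ then

$$P(\text{exists } t,\ |x_t| < \epsilon) > \delta.$$

Proof. We consider the function $V$ defined in Assumptions B3. Let

$$\tilde\epsilon = \inf\{V(x),\ |x| \ge \epsilon\}, \quad \text{and} \quad \tilde R = \sup\{V(x),\ |x| \le R\};$$

then $|x_0| < R \Rightarrow V(x_0) \le \tilde R$ and $V(x) < \tilde\epsilon \Rightarrow |x| < \epsilon$.

We will show that $V(x_t) < \tilde\epsilon$ for some $t$. Denote $V_t := V(x_t)$ and consider the decomposition

$$V_t = V_0 \cdot \frac{V_1}{V_0} \cdot \frac{V_2}{V_1} \cdots \frac{V_t}{V_{t-1}}.$$

First define the deterministic process with constant step $\rho \le \gamma(0)$,

$$z_t = z_{t-1} - \rho\,\varphi(z_{t-1}), \quad t = 1, 2, \ldots$$

By Assumption B3.1 there exists $V(\cdot)$ such that $\{V(z_t)\}$ converges monotonically to zero. Using Taylor expansion,

$$V(z_t) = V(z_{t-1} - \rho\,\varphi(z_{t-1}))$$
$$= V(z_{t-1}) - \rho\,\varphi(z_{t-1})^T\nabla V(z_{t-1}) + \frac{\rho^2}{2}\varphi(z_{t-1})^T\nabla^2 V(z')\,\varphi(z_{t-1})$$
$$= V(z_{t-1}) - \rho \times \left(\varphi(z_{t-1})^T\nabla V(z_{t-1}) - \frac{\rho}{2}\varphi(z_{t-1})^T\nabla^2 V(z')\,\varphi(z_{t-1})\right)$$

for a certain vector $z'$ between $z_t$ and $z_{t-1}$. Define

$$U(z, \rho) := \frac{1}{V(z)} \times \left(\varphi(z)^T\nabla V(z) - \frac{\rho}{2}\varphi(z)^T\nabla^2 V(z')\,\varphi(z)\right)$$

where $z'$ is a point between $z$ and $z - \rho\,\varphi(z)$ and, since $V(z_t)$ decreases monotonically, necessarily $U(\cdot,\cdot) > 0$. Define

$$\bar U := \inf_{\epsilon \le |z| \le R,\ \rho \le \gamma(0)} U(z, \rho)$$

where $\bar U$ is a positive constant because $U(\cdot,\cdot) > 0$ for $\epsilon \le |z| \le R$ and $\rho \le \gamma(0)$.
Now we consider the Taylor expansion for the original process,

$$V(x_t) = V(x_{t-1} - \gamma(s_{t-1})\varphi(x_{t-1}) - \gamma(s_{t-1})\xi_t)$$
$$= V(x_{t-1} - \gamma(s_{t-1})\varphi(x_{t-1})) - \gamma(s_{t-1})\,\xi_t^T\nabla V(x_{t-1} - \gamma(s_{t-1})\varphi(x_{t-1})) + \frac{\gamma^2(s_{t-1})}{2}\xi_t^T\nabla^2 V(x'')\,\xi_t,$$

and defining $\zeta_t := |\xi_t|$ we have for the last two terms

$$-\gamma(s_{t-1})\,\xi_t^T\nabla V(x_{t-1} - \gamma(s_{t-1})\varphi(x_{t-1})) + \frac{\gamma^2(s_{t-1})}{2}\xi_t^T\nabla^2 V(x'')\,\xi_t \le \gamma(0)\,\zeta_t\,|\nabla V(x_{t-1} - \gamma(s_{t-1})\varphi(x_{t-1}))| + \frac{\gamma^2(0)}{2}\zeta_t^2\,|M| \le \zeta_t\, C_\epsilon$$

with the following justification:

1. imposing $\zeta_t < 1$;

2. given $\epsilon \le |x| \le R$, $x_{t-1}$ and $\varphi(x_{t-1})$ are vectors from a closed and bounded set and $\gamma(s_{t-1}) \le \gamma(0)$, so $\nabla V(x_{t-1} - \gamma(s_{t-1})\varphi(x_{t-1}))$ can be bounded.

From the definition of the function $U(\cdot,\cdot)$,

$$V(x_t) \le V(x_{t-1})\left(1 - \gamma(s_{t-1}) \cdot U(x_{t-1}, \gamma(s_{t-1}))\right) + \zeta_t \cdot C_\epsilon$$

and using $1/V(x) \le 1/\tilde\epsilon$ for $\epsilon \le |x| \le R$, and that $\gamma(s_{t-1}) \ge \gamma(m + (t-1) \cdot u_+)$,

$$\frac{V_t}{V_{t-1}} \le 1 - \gamma(s_{t-1}) \cdot \bar U + \zeta_t C_\epsilon/\tilde\epsilon \le 1 - \gamma(m + (t-1)u_+) \cdot \bar U + \zeta_t C_\epsilon/\tilde\epsilon.$$

Denoting $G_t := 1 - \gamma(m + (t-1)u_+) \cdot \bar U$ we have $G_t < 1$. Divergence of the series $\sum_t \gamma(m + t \cdot u_+)$ implies that the product $\prod_{i=1}^{t} G_i$ goes to zero. Using that $G_t < \sqrt{G_t} < 1$ one can choose $\zeta_t$ such that

$$G_t + \zeta_t C_\epsilon/\tilde\epsilon \le \sqrt{G_t} < 1 \qquad (12)$$

and

$$\frac{V_t}{V_{t-1}} \le \sqrt{G_t}$$

whenever $\epsilon \le |x_{t-1}| \le R$ and $|\xi_t| < \zeta_t < 1$. We choose $n$ such that $\tilde R \prod_{t=1}^{n-1}\sqrt{G_t} < \tilde\epsilon$ and suppose we have $|x_0| < R$, $s_0 < m$ and $|\xi_t| < \zeta_t$ when $1 \le t \le n-1$. Then, for some $t \in \{1, \ldots, n\}$, $|x_t| < \epsilon$ with probability greater than

$$\delta := P(|\xi_1| < \zeta_1, \ldots, |\xi_{n-1}| < \zeta_{n-1}),$$

since from Assumption B1.2 $P(\xi_t \in I) > 0$ for any $I$.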

□ 

From Lemmas 1 and 2 we have that for each $\epsilon > 0$ there exists $\delta > 0$ such that, for arbitrary initial conditions $x_0$, $s_0$, $s_1$,

$$P(\text{for some } t,\ |x_t| < \epsilon) > \delta.$$

Then we can choose a positive integer $n = n(x_0, s_0, s_1)$ such that

$$P(\text{for some } t \le n,\ |x_t| < \epsilon) > \delta/2.$$

Denote $p = \sup P(\text{for each } t,\ |x_t| \ge \epsilon)$, the supremum being taken over all initial conditions $x_0$, $s_0$, $s_1$. Fix $x_0$, $s_0$, $s_1$; then

$$P(\text{for each } t,\ |x_t| \ge \epsilon) = P(\text{for each } t > n,\ |x_t| \ge \epsilon \mid \text{for each } t \le n,\ |x_t| \ge \epsilon) \cdot P(\text{for each } t \le n,\ |x_t| \ge \epsilon) \le p\,(1 - \delta/2). \qquad (13)$$

Taking the supremum of the left-hand side of (13) over all triples $(x_0, s_0, s_1)$ gives $p$. Then we obtain the inequality $p \le p\,(1 - \delta/2)$, from which $p = 0$. So we obtain the following lemma.
Lemma 3 For each $\epsilon > 0$, almost surely there exists $t$ such that $|x_t| < \epsilon$.



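Lemma 4 Choose $\epsilon > 0$ and $\eta > 0$. Then there exist $\epsilon_1 > 0$ and $\delta > 0$ such that if $|x_0| < \epsilon_1$ then

$$P(\text{for some } t,\ |x_t| < \epsilon \text{ and } s_t > \eta) > \delta.$$

Proof. Starting from $x_t = x_0 - \sum_{i=1}^{t}\gamma_{i-1} y_i$ and using Taylor expansion,

$$V(x_t) = V\Big(x_0 - \sum_{i=1}^{t}\gamma_{i-1} y_i\Big) \le V(x_0) + |\nabla V(x_0)| \sum_{i=1}^{t}\gamma_{i-1}|y_i|\cos(y_i, \nabla V(x_0)) + C_1\Big|\sum_{i=1}^{t}\gamma_{i-1} y_i\Big|^2.$$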

To guarantee the increase in the step counter $s_t$ required by this lemma, we consider two symmetrical conical sections in which the vectors $y_t$ will stay, and we impose a minimum and a maximum length for $|y_t|$, $y_I \le |y_t| \le y_{II}$, with $y_I$, $y_{II}$ to be defined. We take $x_0$ as a reference point, with gradient $\nabla_0 := \nabla V(x_0)$. As we will see, we are interested in bounding the inner product

$$y_t^T\nabla V(x_0) = |y_t| \cdot |\nabla_0| \cdot \cos(y_t, \nabla_0).$$

We choose $y_{odd}$ to belong to the conical section on the opposite side of the vector $\nabla_0$ and $y_{even}$ to the conical section around $\nabla_0$. We choose a value $\theta$ for the internal angle of the cone centred on the vector $\nabla_0$, with $\theta$ belonging to $(0, \pi/2)$. In this case $\cos(y_t, \nabla_0)$ is bounded by



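$$-1 \le \cos(y_t, \nabla_0) \le -\cos(\theta), \quad t \text{ odd}, \qquad (14)$$
$$\cos(\theta) \le \cos(y_t, \nabla_0) \le 1, \quad t \text{ even}. \qquad (15)$$

Using (14) and (15) we have

$$-y_{II} \le |y_t|\cos(y_t, \nabla_0) \le -y_I\cos(\theta), \quad \text{odd case}, \qquad (16)$$
$$y_I\cos(\theta) \le |y_t|\cos(y_t, \nabla_0) \le y_{II}, \quad \text{even case}. \qquad (17)$$

It is possible to show $V(x_t) < \epsilon$ if we prove

$$V(x_0) < \epsilon/3; \qquad (18)$$
$$\Big|\sum_{i=1}^{t}\gamma_{i-1}|y_i|\,|\nabla_0|\cos(y_i, \nabla_0)\Big| < \epsilon/3; \qquad (19)$$
$$C_1\Big|\sum_{i=1}^{t}\gamma_{i-1} y_i\Big|^2 < \epsilon/3. \qquad (20)$$

From (18) we can estimate $\epsilon_1$ by Assumption B3.3.

From (20) we conclude

$$\sum_{i=1} \cdots \qquad (21)$$

from which we can choose $y_{II}$ (by Assumption B2.2 the series is convergent).
Because $y_t$ belongs to symmetrical conical sections,

$$u(-y_t^T y_{t-1}) \ge u(y_I^2\cos(\pi - \theta)) = u(-y_I^2\cos\theta), \quad t = 1, 2, \ldots, n-1,$$

therefore

$$s_t \ge (t - 2)\,u(-y_I^2\cos\theta), \quad t = 3, 4, \ldots, n. \qquad (22)$$

To satisfy $s_t > \eta$, required by this lemma's statement, we assume $y_I > y_{II}/2$, and

$$n - 2 \ge \frac{\eta}{u(-(y_{II}^2/4)\cos\theta)} \qquad (23)$$

obtained from (22).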
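Developing the left-hand side of (19) we have, by (16) and (17),

$$-y_{II}\sum_{i \text{ odd}}^{t}\gamma_{i-1} + y_I\cos(\theta)\sum_{i \text{ even}}^{t}\gamma_{i-1} \le \sum_{i=1}^{t}\gamma_{i-1}|y_i|\,|\nabla_0|\cos(y_i, \nabla_0) \le -y_I\cos(\theta)\sum_{i \text{ odd}}^{t}\gamma_{i-1} + y_{II}\sum_{i \text{ even}}^{t}\gamma_{i-1}. \qquad (24)$$

The odd sum is larger than the even sum if we start at $i = 1$. So

$$\Big|\sum_{i=1}^{t}\gamma_{i-1}|y_i|\,|\nabla_0|\cos(y_i, \nabla_0)\Big| \le y_{II}\sum_{i \text{ odd}}^{t}\gamma_{i-1} - y_I\cos(\theta)\sum_{i \text{ even}}^{t}\gamma_{i-1}. \qquad (25)$$

Using (25), Condition (19) is satisfied if

$$y_{II}\sum_{i \text{ odd}}^{t}\gamma_{i-1} - y_I\cos(\theta)\sum_{i \text{ even}}^{t}\gamma_{i-1} < \epsilon/3 \qquad (26)$$

where we can choose $y_I > y_{II}/2$.

For each iteration $t$ the values of $\varphi(x_t) := \varphi_t$, $y_I$, $y_{II}$, $\theta$ are known. Let

$$w_t := \frac{(\varphi_{t-1} + \xi_t)^T\nabla_0}{|y_t| \cdot |\nabla_0|};$$

the conditions that define the admissible region for each random vector $\xi_t$ are

$$y_I \le |\varphi_{t-1} + \xi_t| \le y_{II},$$
$$\pi - \theta \le \cos^{-1}(w_t) \le \pi, \quad t \text{ odd}, \qquad (27)$$
$$0 \le \cos^{-1}(w_t) \le \theta, \quad t \text{ even}.$$

We define $\delta_1$ as the smallest probability of the regions defined in each iteration $t = 1, \ldots, n$ and define $\delta := \delta_1^n$. The probability $\delta_1$ is positive by Assumption B1.3.

□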
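The admissible region used in the proof of Lemma 4 (a length constraint plus a cone around $\nabla_0$ for even $t$ and around $-\nabla_0$ for odd $t$) can be checked with a small predicate. This is a minimal sketch under our reading of the conditions; the function name `in_admissible_region` and its parameters are hypothetical.

```python
import math

def in_admissible_region(y, grad0, t, theta, y_min, y_max):
    """Check the length and angle conditions on y_t relative to grad0 = grad V(x_0)."""
    norm_y = math.sqrt(sum(v * v for v in y))
    norm_g = math.sqrt(sum(v * v for v in grad0))
    if not (y_min <= norm_y <= y_max):
        return False
    w = sum(a * b for a, b in zip(y, grad0)) / (norm_y * norm_g)
    angle = math.acos(max(-1.0, min(1.0, w)))   # clamp against rounding
    if t % 2 == 1:                               # odd t: cone opposite to grad0
        return math.pi - theta <= angle <= math.pi
    return 0.0 <= angle <= theta                 # even t: cone around grad0

g = [1.0, 0.0]
print(in_admissible_region([-1.0, 0.1], g, 1, 0.5, 0.5, 2.0))
print(in_admissible_region([1.0, 0.1], g, 2, 0.5, 0.5, 2.0))
```

Alternating between the two cones forces $-y_t^T y_{t-1}$ to be positive at each step, which is what drives the counter $s_t$ upward.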



From Lemmas 3 and 4 it follows that for each $\epsilon > 0$ and $\eta > 0$ the probability that, for some $t$, $|x_t| < \epsilon$ and $s_t > \eta$ is greater than a positive $\delta$ that depends only on $\epsilon$ and $\eta$. Repeating the argument of Lemma 3 we have

Lemma 5 For each $\epsilon > 0$ and $\eta > 0$, almost surely there exists $t$ such that $|x_t| < \epsilon$ and $s_t > \eta$.



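We define the stopping time $\tau(\epsilon) = \inf\{t : |x_t| \ge \epsilon\}$.

Lemma 6 For each $0 < \theta < E_0$ there exist a constant $\epsilon_0 > 0$ and a sequence $\pi_n$ such that $\lim_{n \to \infty}\pi_n = 0$ and

$$P(s_t \ge s_0 + t\theta - n \text{ for each } t < \tau(\epsilon_0)) \ge 1 - \pi_n.$$

Proof. We will show that

$$P(\text{exists } t < \tau(\epsilon_0) \text{ such that } s_t < s_0 + t\theta - n) \le \pi_n \to 0.$$

From B4.2 it follows that for some positive $w_0$ we have $E_{w_0} > \theta$, where $E_{w_0} = E[u(X^{(w_0)})]$ and

$$X^{(w_0)} = \inf_{|\varphi_1| < w_0,\ |\varphi_2| < w_0}\left[-(\xi_1 + \varphi_1)^T(\xi_2 + \varphi_2)\right]. \qquad (28)$$

We choose $\epsilon_0$ such that

$$\sup_{|x| < \epsilon_0}|\varphi(x)| < w_0$$

and define the sequence $\{\tilde s_t\}$ by

$$\tilde s_0 = s_0; \quad \tilde s_t = \tilde s_{t-1} + u(X_t^{(w_0)}) \qquad (29)$$

where

$$X_t^{(w_0)} = \inf_{|\varphi_{t-1}| < w_0,\ |\varphi_{t-2}| < w_0}\left[-(\xi_t + \varphi_{t-1})^T(\xi_{t-1} + \varphi_{t-2})\right]. \qquad (30)$$

Comparing (29) and (30) with (2), for $t < \tau(\epsilon_0)$, we obtain

$$\tilde s_t \le s_t. \qquad (31)$$

From (29) it follows that

$$\tilde s_t - s_0 = tE_{w_0} + I_t^{even} + I_t^{odd} \qquad (32)$$

where

$$I_t^{even} = \sum_{i \text{ even}}^{t}\left[u(X_i^{(w_0)}) - E_{w_0}\right], \qquad I_t^{odd} = \sum_{i \text{ odd}}^{t}\left[u(X_i^{(w_0)}) - E_{w_0}\right],$$

and $I_t^{even}$ and $I_t^{odd}$ are sums of independent and identically distributed random variables with mean zero and variance linear in $t$.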

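Comment 6 Both variables $I_t^{even}$ and $I_t^{odd}$ are asymptotically normal; however, they are dependent on each other. We use the following argument to estimate the probability of their sum: $X + Y < a$ implies $X < a/2$ or $Y < a/2$, where $X$ and $Y$ are random variables and $a$ is a real constant. Then,

$$P(X + Y < a) \le P(X < a/2) + P(Y < a/2) \approx 2P(X < a/2).$$

So, using that $\mathrm{Var}\, I_t^{even} = t \cdot v$, we have

$$P(I_t^{even} + I_t^{odd} < 2a) \le 2P(I_t^{even} < a) \le 2\Phi\left(\frac{a}{\sqrt{tv}}\right). \qquad (33)$$

From the event $s_t < s_0 + t\theta - n$, and knowing that $\tilde s_t \le s_t$ for $t < \tau(\epsilon_0)$, it follows that

$$\tilde s_t < s_0 + t\theta - n \iff s_0 + tE_{w_0} + I_t^{even} + I_t^{odd} < s_0 + t\theta - n \iff I_t^{even} + I_t^{odd} < -t(E_{w_0} - \theta) - n. \qquad (34)$$

Comment 7 We will use the following argument, where $\{X_i,\ i = 1, \ldots\}$ is a sequence of random variables,

$$P(\text{exists } t < \tau \text{ such that } X_t < a) \le \sum_{i=1}^{\tau}P(X_i < a) \le \sum_{i=1}^{\infty}P(X_i < a). \qquad (35)$$

By (33), (34) and (35) it follows that

$$P(\text{exists } t < \tau(\epsilon_0) \text{ such that } s_t < s_0 + t\theta - n) \le P(\text{exists } t \text{ such that } I_t^{even} + I_t^{odd} < -t(E_{w_0} - \theta) - n)$$
$$\le \sum_{t=1}^{\infty}P(I_t^{even} + I_t^{odd} < -t(E_{w_0} - \theta) - n) \le \sum_{t=1}^{\infty}2\Phi\left(-\frac{t(E_{w_0} - \theta) + n}{2\sqrt{tv}}\right),$$

and the terms of the last series are bounded using certain constants $K_1 > 0$ and $K_2 > 0$. The last series is convergent and so $\pi_n \to 0$, where

$$\pi_n := P(\text{exists } t \text{ such that } I_t^{even} + I_t^{odd} < -n - t(E_{w_0} - \theta)) \to 0 \quad \text{when } n \to \infty.$$

□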



□ 



Now choose $\theta$ and $\epsilon_0$ as in Lemma 6 and arbitrary positive values $\epsilon < \epsilon_0$ and $n$, and define the stopping time

$$\nu = \nu(n, \epsilon) = \inf\{t : |x_t| \ge \epsilon \text{ or } s_t < s_0 - n + t\theta\}$$

and choose $\epsilon_1 > 0$ such that

$$\sup_{|x| < \epsilon_1}V(x) \le \frac{1}{2}\inf_{|x| \ge \epsilon}V(x).$$

Lemma 7 Let $|x_0| < \epsilon_1$; then

$$P(\nu < \infty) \le K\int_{s_0 - n - 1}^{\infty}\gamma^2(s)\,ds + \pi_n,$$

where $K$ is a constant depending on $\epsilon$.
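Proof. Using (8) from Lemma 1,

$$V_t - V_{t-1} \le -\gamma_{t-1}\varphi_{t-1}^T\nabla V_{t-1} - \gamma_{t-1}\xi_t^T\nabla V_{t-1} + \tfrac{1}{2}\gamma_{t-1}^2(\varphi_{t-1}^T M\varphi_{t-1} + \xi_t^T M\xi_t)$$

and let $V_t - V_0 \le I'_t + I''_t$ where

$$I'_t = -\sum_{i=1}^{t}\left(\gamma_{i-1}\varphi_{i-1}^T\nabla V_{i-1} + \gamma_{i-1}\xi_i^T\nabla V_{i-1}\right),$$
$$I''_t = \tfrac{1}{2}\sum_{i=1}^{t}\gamma_{i-1}^2\left(\varphi_{i-1}^T M\varphi_{i-1} + \xi_i^T M\xi_i\right).$$

Let $\delta := (1/2)\inf_{|x| \ge \epsilon}V(x)$. For $|x_t| \ge \epsilon$ we have $V_t - V_0 \ge \delta$, therefore

$$I'_t + I''_t \ge V_t - V_0 \ge \delta,$$

implying $I'_t \ge \delta/2$ or $I''_t \ge \delta/2$. We wish to estimate $P(\nu < \infty)$. Denote

$$P' = P(I'_\nu\, I(\nu < \infty) \ge \delta/2), \qquad P'' = P(I''_\nu\, I(\nu < \infty) \ge \delta/2)$$

and, using Lemma 6,

$$P(\nu < \infty) \le \pi_n + P' + P''. \qquad (36)$$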
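Using Markov's inequality (for example, [9, p. 59]), $I^2(\cdot) = I(\cdot)$, and $I(i \le \nu < \infty) \le I(i - 1 < \nu)$,

$$P' \le \frac{4}{\delta^2}E[I'^2_\nu\, I(\nu < \infty)] \le \frac{4}{\delta^2}E\Big[\sum_{i=1}^{\infty}\sum_{j=1}^{\infty}\gamma_{i-1}(\varphi_{i-1} + \xi_i)^T\nabla V_{i-1}\, I(i - 1 < \nu) \times \gamma_{j-1}(\varphi_{j-1} + \xi_j)^T\nabla V_{j-1}\, I(j - 1 < \nu)\Big].$$

Recall that the variables $\gamma_{i-1}$, $\nabla V_{i-1}$, $I(i - 1 < \nu)$ and $\xi_i$ are mutually independent. We conclude that the terms with $i \ne j$ are zero. So,

$$P' \le \frac{4}{\delta^2}E\sum_{i=1}^{\infty}\left[\gamma_{i-1}^2(\varphi_{i-1}^T\nabla V_{i-1})^2(\xi_i^T\nabla V_{i-1})^2\, I(i - 1 < \nu)\right] \le K' E\sum_{i=0}^{\nu-1}\gamma_i^2 \qquad (37)$$

where $K'$ is a constant that verifies

$$(4/\delta^2) \cdot \sup_{|x| < \epsilon}(\varphi^T\nabla V)^2 \cdot \sup_{|x| < \epsilon}E[\xi^T\nabla V]^2 \le K'.$$

Using $P(X \ge \delta/2) \le (2/\delta)E X$,

$$P'' \le \frac{2}{\delta}E[I''_\nu\, I(\nu < \infty)] \le K'' E\sum_{i=0}^{\nu-1}\gamma_i^2 \qquad (38)$$

where $K''$ verifies

$$(2/\delta)\sup_{|x| < \epsilon}\left(\varphi^T M\varphi + E\,\xi^T M\xi\right) \le K''$$

using $E\,\xi\xi^T := S_\xi$.

For $t < \nu$, $s_t \ge s_0 + t\theta - n$, then $\gamma_t \le \gamma(s_0 - n + t\theta)$, and

$$E\sum_{i=0}^{\nu-1}\gamma_i^2 \le \sum_{i=0}^{\infty}\gamma^2(s_0 - n + i\theta) \le \frac{1}{\theta}\int_{s_0 - n - 1}^{\infty}\gamma^2(s)\,ds. \qquad (39)$$

Taking $K = \theta^{-1}(K' + K'')$, from (36), (37), (38) and (39) we obtain Lemma 7.

□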



Now choose positive $\epsilon < \epsilon_0$ and choose $n$ and $\eta$ such that $1 - \pi_n - K\int_{\eta - n - 1}^{\infty}\gamma^2(s)\,ds =: \delta$ is positive. Choose also $\epsilon_1 = \epsilon_1(\epsilon)$ as defined above. In agreement with Lemmas 5 and 7, almost surely there exists $t_0$ such that $|x_{t_0}| < \epsilon_1$, $s_{t_0} > \eta$, and the probability that $|x_t| < \epsilon$ for all $t > t_0$ exceeds $\delta$.

We define the sequence of stopping times $\tau_1 = 1$,

$$\tau_{i+1} = \inf\{\tau > \tau_i : |x_\tau| \ge \epsilon, \text{ and for some } \tau_i \le t < \tau,\ |x_t| < \epsilon_1 \text{ and } s_t > \eta\}, \quad i = 1, 2, \ldots$$

We have

$$P(\tau_{i+1} = \infty \mid \tau_i < \infty) \ge \delta,$$

from which

$$P(\tau_{i+1} < \infty) = P(\tau_{i+1} < \infty \mid \tau_i < \infty)\, P(\tau_i < \infty) \le (1 - \delta)\, P(\tau_i < \infty).$$

So $P(\tau_i < \infty) \to 0$ when $i \to \infty$, implying that almost surely $i_0 = \sup\{i : \tau_i < \infty\}$ is finite.

In accordance with Lemma 5, almost surely there exists $t_0 \ge \tau_{i_0}$ such that $|x_{t_0}| < \epsilon_1$ and $s_{t_0} > \eta$; from here we conclude that $|x_t| < \epsilon$ when $t \ge t_0$. Theorem 1 is proved. □
3 Proof of the asymptotical normality

The central idea of the proof follows the work of Delyon and Juditsky (1993) [1].

Lemma 8 (Delyon and Juditsky [1]) Let $(v_t)$ be a random sequence of real numbers such that $v_t \to 0$ almost surely when $t \to \infty$. Then there exists a deterministic sequence $(a_t)$ such that

$$a_t \to 0 \quad \text{and} \quad v_t/a_t \to 0 \text{ almost surely}. \qquad (40)$$

In what follows $o$ and $O$ have the standard deterministic meaning; however, they often represent random variables measurable with respect to the $\sigma$-algebra $\mathcal{F}_t$.

Lemma 9 Let $\{z_i,\ i = 1, \ldots\}$ be a sequence of non-negative random variables verifying $z_i \to 0$ almost surely, and let $\{\xi_i\}$ be a sequence of iid random variables with finite variances. The variables $z_i$ and $\xi_i$ may be dependent. Then

$$\sum_{i=1}^{t} z_i|\xi_i| = o(t)$$

almost surely.

Proof. From Lemma 8 there exists a deterministic sequence $\{a_i\}$ such that $z_i/a_i \to 0$ almost surely. Then $0 \le z_i(\omega)/a_i \le M(\omega)$ for each elementary event $\omega$. Denote $\zeta_i := |\xi_i| - \mu$ where $\mu := E(|\xi|)$, so $E\zeta_i = 0$ and $\mathrm{Var}\,\zeta_i < \infty$.

Let $S_t = \sum_{i=1}^{t} a_i\zeta_i$. Then $S_t/t \to 0$ in probability by Chebyshev's inequality. Then, by Lévy's Theorem (for example, [7] p. 211), $S_t/t \to 0$ almost surely because $\{a_i\zeta_i\}$ is a sequence of independent random variables. (The same result follows from Kronecker's Lemma [7] because $\sum_i \mathrm{Var}(a_i\zeta_i/i) < \infty$.)

Then $S_t = o(t)$ almost surely and, since $a_i \to 0$,

$$\sum_{i=1}^{t} z_i|\xi_i| \le M(\omega)\sum_{i=1}^{t} a_i|\xi_i| = M(\omega) \cdot o(t) = o(t) \quad \text{almost surely}.$$

□

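Recall the definition of $E_0$ in Assumption B4.2.

Lemma 10 Let $s_0$ and $s_1$ be random variables which are the initial conditions of the process $\{s_t\}$ defined in (2). Then

$$\gamma(s_t) = 1/s_t = \frac{1}{E_0 t}(1 + o_t), \quad \text{almost surely}, \qquad (41)$$

where $o_t$ is a random variable measurable with respect to $\mathcal{F}_t$ and for which $\lim_{t \to \infty} o_t = 0$ almost surely.

Proof. Assumption B4.3 permits the decomposition

$$u(-y_{i-1}^T y_i) = u(-(\varphi_{i-2} + \xi_{i-1})^T(\varphi_{i-1} + \xi_i)) = u(-\xi_{i-1}^T\xi_i) + u'(\theta_i) \times (-\varphi_{i-2}^T\varphi_{i-1} - \varphi_{i-2}^T\xi_i - \varphi_{i-1}^T\xi_{i-1}) \qquad (42)$$

where $\theta_i$ is a point between $-y_{i-1}^T y_i$ and $-\xi_{i-1}^T\xi_i$. We also have that the function $u'$ is bounded and $\varphi(x_i) \to 0$, from where, by Lemma 9,

$$\sum_{i=1}^{t}u'(\theta_i)\,\varphi_{i-2}^T\varphi_{i-1} = o(t), \qquad (43)$$
$$\sum_{i=1}^{t}u'(\theta_i)\,\varphi_{i-2}^T\xi_i = o(t), \qquad (44)$$
$$\sum_{i=1}^{t}u'(\theta_i)\,\varphi_{i-1}^T\xi_{i-1} = o(t). \qquad (45)$$

So, we have

$$s_t = s_0 + s_1 + \sum_{i=1}^{t}\left(u(-y_{i-1}^T y_i) - u(-\xi_{i-1}^T\xi_i)\right) + \sum_{i \text{ even}}u(-\xi_{i-1}^T\xi_i) + \sum_{i \text{ odd}}u(-\xi_{i-1}^T\xi_i) = s_0 + s_1 + \Delta U_t + P_t + I_t.$$

By (43), (44) and (45),

$$\Delta U_t = \sum_{i=1}^{t}\left(u(-y_{i-1}^T y_i) - u(-\xi_{i-1}^T\xi_i)\right) = o(t) \quad \text{almost surely}.$$

Each of the sums $P_t$ and $I_t$ is composed of independent terms with mean $E_0$ and finite variance. By the law of the iterated logarithm,

$$P_t + I_t = E_0 t + O(\sqrt{t\log\log t}).$$

Using $\lim_{t \to \infty}s_0/t = 0$ almost surely, and likewise for $s_1$, we have

$$s_t = s_0 + s_1 + E_0 t + t\,o_t + O(\sqrt{t\log\log t}) = (E_0 + o_t)\,t,$$

almost surely. Then

$$\frac{1}{s_t} = \frac{1}{(E_0 + o_t)\,t} = \frac{1}{E_0 t}\left(1 - \frac{o_t}{E_0 + o_t}\right) = \frac{1}{E_0 t}(1 + o_t).$$

□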
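The linear growth $s_t \approx E_0 t$ behind (41) can be observed numerically. The sketch below assumes (our simplification, not the paper's setting) that the iterates already sit at the stationary point, so that $y_t = \xi_t$, with scalar standard normal noise and an indicator-type $u$; in that case $E_0 = P(\xi_1\xi_2 < 0) = 1/2$. The helper name `counter_growth` is hypothetical.

```python
import random

def counter_growth(u, steps, rng):
    """Simulate s_t = (s_{t-1} + u(-xi_t * xi_{t-1}))^+ with scalar noise only
    (phi = 0 along the path) and return s_t / t."""
    s = 0.0
    prev = rng.gauss(0.0, 1.0)
    for _ in range(steps):
        cur = rng.gauss(0.0, 1.0)
        s = max(s + u(-cur * prev), 0.0)
        prev = cur
    return s / steps

ratio = counter_growth(lambda v: 1.0 if v > 0 else 0.0, 100000, random.Random(2))
print(ratio)
```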



Proof of Theorem 2 We choose $x^* = 0$. In the last section we showed the almost sure convergence $x_t \to 0$, and in Lemma 10 we showed the mean behaviour $s_t = E_0 t\left(\frac{1}{1 + o_t}\right)$, where $o_t \to 0$ almost surely.
By Lemma 8 we can conclude that there exists a sequence $(a_t)$ of positive non-random numbers such that

$$a_t \to 0 \quad \text{and} \quad |o_t|/a_t \to 0, \quad |x_t|/a_t \to 0 \text{ almost surely}. \qquad (46)$$



Comment 8 We provide an explanation for the above fact. We can take $\theta_t := |o_t| + |x_t|$, and then $\theta_t \to 0$ almost surely. Then there exists a deterministic sequence $a_t \to 0$ such that $\theta_t/a_t \to 0$ almost surely. From here it follows that $|o_t|/a_t \to 0$ and $|x_t|/a_t \to 0$ almost surely.

We define the stopping times

$$\tau_R = \inf\{t : |o_t| \ge R|a_t|\}, \qquad \sigma_R = \inf\{t : |x_t| \ge R|a_t|\} \qquad (47)$$

for $R > 0$ and

$$\nu = \min(\tau_R, \sigma_R). \qquad (48)$$

From Lemma 8 and from (46) we conclude that for each $\epsilon > 0$ we can choose $R < \infty$ such that

$$P(\nu = \infty) \ge 1 - \epsilon. \qquad (49)$$

In this way, with probability as large as we want, we have a common deterministic bound for $|o_t|$ and $|x_t|$.

Now consider a process similar to the algorithm in (1), but with deterministic step $\gamma_t = 1/(E_0 t)$, applied to the function $\varphi(x) = ax$ ($a$ is the derivative of $\varphi$ at $x^*$),

$$z_t = z_{t-1} - \frac{1}{E_0 t}(a z_{t-1} + \xi_t), \quad z_0 = x_0. \qquad (50)$$

The asymptotic properties of this process are known (for example, Nevel'son and Has'minskii [1]). So

$$z_t\, t^{1/2 - \epsilon} \to 0 \text{ almost surely, for each } \epsilon > 0,$$
$$E|z_t|^2 \le K/t,$$
$$\sqrt{t}\, z_t \xrightarrow{d} N(0, V), \qquad (51)$$

where $V$ is the matrix defined in (6).
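The limit (51) can be checked by simulation in the simplest scalar case. The sketch below fixes $E_0 = 1$ and unit noise variance (both assumptions made only for this illustration), so the process is $z_t = z_{t-1} - (a z_{t-1} + \xi_t)/t$ and the classical asymptotic variance of $\sqrt{t}\,z_t$ is $1/(2a - 1)$ for $a > 1/2$. The helper name `scaled_endpoint` is hypothetical.

```python
import random

def scaled_endpoint(a, steps, rng):
    """Run z_t = z_{t-1} - (1/t)(a z_{t-1} + xi_t) with E_0 = 1 and return sqrt(t) z_t."""
    z = 0.0
    for t in range(1, steps + 1):
        z -= (a * z + rng.gauss(0.0, 1.0)) / t
    return steps ** 0.5 * z

rng = random.Random(3)
a = 2.0
samples = [scaled_endpoint(a, 1000, rng) for _ in range(1000)]
mean = sum(samples) / len(samples)
var = sum((v - mean) ** 2 for v in samples) / (len(samples) - 1)
print(var, 1.0 / (2.0 * a - 1.0))
```

The empirical variance should be close to the second printed value; the agreement improves as the number of steps and of replications grows.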



Based on Lemma 15 in the section of standard results, Lemma 13 will show that, asymptotically, $\sqrt{t}\,x_t$ and $\sqrt{t}\,z_t$ have the same limiting distribution, described in (51). □



Lemma 11 Consider the following recursive formula, where $b > 0$ and $a_t$ are real numbers,

$$0 \le a_{t+1} \le \left(1 - \frac{b}{t}\right)a_t + o(t^{-1}), \quad t = 1, 2, \ldots \qquad (52)$$

Then $a_t \to 0$.

Proof. Consider the recursive sequence, where $\epsilon$ is a positive real number,

$$0 \le A_{t+1} \le \left(1 - \frac{b}{t}\right)A_t + \epsilon/t, \quad t = t_0, t_0 + 1, \ldots$$

Then

$$0 \le A_{t+1} \le A_t - \frac{bA_t - \epsilon}{t}, \quad t = t_0, t_0 + 1, \ldots$$

or

$$0 \le bA_{t+1} - \epsilon \le bA_t - \epsilon - b\,\frac{bA_t - \epsilon}{t}, \quad t = t_0, t_0 + 1, \ldots$$

We write $B_t = bA_t - \epsilon$ and

$$B_{t+1} = B_t(1 - b/t)$$

so $B_t \to 0$, therefore $A_t \to \epsilon/b$.
The lemma's sequence is

$$0 \le a_{t+1} \le \left(1 - \frac{b}{t}\right)a_t + o(1)/t, \quad t = 1, 2, \ldots$$

for which we choose $\epsilon > 0$ such that $o(1) < \epsilon$ if $t \ge t_0$ for some $t_0$. We define

$$A_{t+1} = \left(1 - \frac{b}{t}\right)A_t + \epsilon/t, \quad t = t_0, t_0 + 1, \ldots$$

and $A_{t_0} = a_{t_0}$. Now we show $0 \le a_t \le A_t$ using an induction argument. Suppose $A_t - a_t \ge 0$ for some $t \ge t_0$. For $t + 1$,

$$A_{t+1} - a_{t+1} \ge \left(1 - \frac{b}{t}\right)(A_t - a_t) + (\epsilon - o(1))/t,$$

verifying that $A_{t+1} - a_{t+1} \ge 0$ using the hypothesis. Then $0 \le a_t \le A_t$.

With $A_t \to \epsilon/b$, and since we can choose $\epsilon$ as small as we want, we conclude that $a_t \to 0$.

□
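The comparison sequence in the proof of Lemma 11 is easy to iterate numerically. The sketch below runs $A_{t+1} = (1 - b/t)A_t + \epsilon/t$ from an arbitrary starting value and prints the result next to the predicted limit $\epsilon/b$; the parameter values and the helper name `iterate_bound` are chosen only for illustration.

```python
def iterate_bound(b, eps, a_start, t0, t_end):
    """Iterate A_{t+1} = (1 - b/t) A_t + eps/t for t = t0, ..., t_end - 1."""
    a = a_start
    for t in range(t0, t_end):
        a = (1.0 - b / t) * a + eps / t
    return a

limit = iterate_bound(b=2.0, eps=0.1, a_start=5.0, t0=3, t_end=200000)
print(limit, 0.1 / 2.0)
```

Starting at $t_0 > b$ keeps every factor $1 - b/t$ in $(0, 1)$, which matches the role of $t_0$ in the proof.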



Lemma 12 Let $A$ be a symmetric positive definite matrix, and $a$, $b$, $c$ and $d$ real vectors. Then

$$(a + b + c + d)^T A(a + b + c + d) \le a^T A a + 3(b^T A b + c^T A c + d^T A d) + a^T A b + b^T A a + 2a^T A(c + d).$$


Proof. From

$$(a - b)^T A(a - b) = a^T A a + b^T A b - a^T A b - b^T A a \ge 0 \implies a^T A b + b^T A a \le a^T A a + b^T A b$$

we have

$$(a + b)^T A(a + b) = a^T A a + b^T A b + a^T A b + b^T A a \le a^T A a + b^T A b + a^T A a + b^T A b = 2(a^T A a + b^T A b).$$

In a similar way

$$(a + b + c)^T A(a + b + c) = a^T A a + b^T A b + c^T A c + (a^T A b + b^T A a) + (a^T A c + c^T A a) + (b^T A c + c^T A b)$$
$$\le a^T A a + b^T A b + c^T A c + (a^T A a + b^T A b) + (a^T A a + c^T A c) + (b^T A b + c^T A c) = 3(a^T A a + b^T A b + c^T A c).$$

So,

$$(a + b + c + d)^T A(a + b + c + d) = (a + (b + c + d))^T A(a + (b + c + d))$$
$$= a^T A a + a^T A(b + c + d) + (b + c + d)^T A a + (b + c + d)^T A(b + c + d)$$
$$\le a^T A a + 3(b^T A b + c^T A c + d^T A d) + a^T A b + b^T A a + 2a^T A(c + d).$$

□
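The inequality of Lemma 12 can be sanity-checked on random data. The sketch below builds symmetric positive definite matrices as $G^T G + I$ (an assumed construction for the test only), draws random vectors, and records the smallest gap between the right-hand and left-hand sides; the helper names `quad` and `lemma12_gap` are hypothetical.

```python
import random

def quad(A, x, y):
    """Return x^T A y for a square matrix A given as a list of rows."""
    return sum(x[i] * A[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))

def lemma12_gap(A, a, b, c, d):
    """Right-hand side minus left-hand side of the inequality of Lemma 12."""
    s = [a[i] + b[i] + c[i] + d[i] for i in range(len(a))]
    cd = [c[i] + d[i] for i in range(len(a))]
    lhs = quad(A, s, s)
    rhs = (quad(A, a, a) + 3.0 * (quad(A, b, b) + quad(A, c, c) + quad(A, d, d))
           + quad(A, a, b) + quad(A, b, a) + 2.0 * quad(A, a, cd))
    return rhs - lhs

rng = random.Random(4)
n = 3
gaps = []
for _ in range(200):
    G = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
    # A = G^T G + I is symmetric positive definite
    A = [[sum(G[k][i] * G[k][j] for k in range(n)) + (1.0 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    vecs = [[rng.uniform(-5, 5) for _ in range(n)] for _ in range(4)]
    gaps.append(lemma12_gap(A, *vecs))
print(min(gaps))
```

A non-negative minimum gap is consistent with the lemma; it is of course not a proof.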



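Lemma 13 Let $\Delta_t := x_t - z_t$. Then $\sqrt{t}\,\Delta_t \xrightarrow{P} 0$.

Proof. From Lemma 10, $\gamma_t = \frac{1}{s_t} = \frac{1}{E_0 t}(1 + o_t)$, where $o_t$ is a random variable measurable with respect to $\mathcal{F}_t$ which converges to $0$ almost surely. Then, from (1), (2) with $\gamma_t = 1/s_t$,

$$x_{t+1} = x_t - \frac{1}{E_0 t}(1 + o_t)(\varphi(x_t) + \xi_{t+1}) \qquad (53)$$

and

$$x_{t+1} = x_t - \frac{1}{E_0 t}\varphi(x_t) - \frac{1}{E_0 t}\xi_{t+1} - \frac{1}{E_0 t}o_t\varphi(x_t) - \frac{1}{E_0 t}o_t\xi_{t+1}.$$

From Assumption B3.4,

$$\varphi(x) = (\varphi(x) - \varphi'(0)x) + \varphi'(0)x,$$

so

$$x_{t+1} = x_t - \frac{1}{E_0 t}\varphi'(0)x_t - \frac{1}{E_0 t}\xi_{t+1} - \frac{1}{E_0 t}o_t\xi_{t+1} - \frac{1}{E_0 t}\left(o_t\varphi(x_t) + \varphi(x_t) - \varphi'(0)x_t\right).$$

Define

$$v_t := o_t\frac{\varphi(x_t)}{|x_t|} + \frac{\varphi(x_t) - \varphi'(0)x_t}{|x_t|}$$

and for $t < \nu$ we have $|x_t| \le R a_t$ and $|o_t| \le R a_t$, so

$$|v_t| \le R a_t\sup\frac{|\varphi(x)|}{|x|} + \sup\frac{|\varphi(x_t) - \varphi'(0)x_t|}{|x_t|} \le R a_t M + o(1) := c_t. \qquad (54)$$

We note that $c_t \to 0$, where $c_t$ is a positive decreasing sequence, and

$$x_{t+1} = x_t - \frac{1}{E_0 t}\varphi'(0)x_t - \frac{1}{E_0 t}\xi_{t+1} - \frac{1}{E_0 t}o_t\xi_{t+1} - \frac{1}{E_0 t}v_t|x_t|.$$

Considering the algorithm for $z_t$,

$$z_{t+1} = z_t - \frac{1}{E_0 t}(\varphi'(0)z_t + \xi_{t+1}),$$

and

$$x_{t+1} = x_t - \frac{1}{E_0 t}\varphi'(0)x_t - \frac{1}{E_0 t}\xi_{t+1} - \frac{1}{E_0 t}o_t\xi_{t+1} - \frac{1}{E_0 t}v_t|x_t|,$$
$$z_{t+1} = z_t - \frac{1}{E_0 t}\varphi'(0)z_t - \frac{1}{E_0 t}\xi_{t+1},$$

from where

$$\Delta_{t+1} = \Delta_t - \frac{1}{E_0 t}\varphi'(0)\Delta_t - \frac{1}{E_0 t}v_t|x_t| - \frac{1}{E_0 t}o_t\xi_{t+1}.$$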



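We wish to show that $\sqrt{t}\,\Delta_t = \sqrt{t}(x_t - z_t) \xrightarrow{P} 0$, where $\Delta_t = x_t - z_t$, and for that purpose we define $V_t := \Delta_t^T A\Delta_t$, where $A$ is a positive definite matrix to be specified.

First we show that $E[tV_t\, I(t < \nu)] \to 0$, and by Theorem 5 it follows that $\sqrt{t}(x_t - z_t) \xrightarrow{P} 0$. So,

$$V_{t+1} = \Delta_{t+1}^T A\Delta_{t+1} = \left(\Delta_t - \frac{1}{E_0 t}\varphi'(0)\Delta_t - \frac{1}{E_0 t}v_t|x_t| - \frac{1}{E_0 t}o_t\xi_{t+1}\right)^T \cdot A \cdot \left(\Delta_t - \frac{1}{E_0 t}\varphi'(0)\Delta_t - \frac{1}{E_0 t}v_t|x_t| - \frac{1}{E_0 t}o_t\xi_{t+1}\right)$$

or, after transposition,

$$V_{t+1} = \left(\Delta_t^T - \frac{1}{E_0 t}\Delta_t^T\varphi'(0)^T - \frac{1}{E_0 t}|x_t|v_t^T - \frac{1}{E_0 t}o_t\xi_{t+1}^T\right) \cdot A \cdot \left(\Delta_t - \frac{1}{E_0 t}\varphi'(0)\Delta_t - \frac{1}{E_0 t}v_t|x_t| - \frac{1}{E_0 t}o_t\xi_{t+1}\right).$$

To estimate $V_{t+1}$ we use Lemma 12 to obtain

$$V_{t+1} \le V_t + B_t + C_t + D_t$$

with $B_t$, $C_t$ and $D_t$ to be specified, and using $I(t+1 < \nu) \le I(t < \nu)$ we estimate $E[(t+1)V_{t+1}\, I(t+1 < \nu)]$ by

$$E[(t+1)V_{t+1}\, I(t+1 < \nu)] \le E[(t+1)V_t\, I(t < \nu)] + E[(t+1)B_t\, I(t < \nu)] + E[(t+1)C_t\, I(t < \nu)] + E[(t+1)D_t\, I(t < \nu)].$$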
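Considering times $t < \nu$ we have $|x_t| \le R a_t$ and $|o_t| \le R a_t$. For $B_t$, considering $t < \nu$, and writing $\Delta_t = x_t - z_t$,

$$B_t = \frac{3}{E_0^2 t^2}\left(\Delta_t^T\varphi'(0)^T A\varphi'(0)\Delta_t + |x_t|^2 v_t^T A v_t + o_t^2\,\xi_{t+1}^T A\xi_{t+1}\right)$$
$$\le \frac{3}{E_0^2 t^2}\left(K_1 \cdot V_t + |v_t|^2 \cdot |x_t|^2 \cdot |A| + o_t^2|A||\xi_{t+1}|^2\right)$$
$$\le \frac{3}{E_0^2 t^2}\left(K_1 \cdot V_t + c_t^2 \cdot R^2 a_t^2 \cdot |A| + R^2 a_t^2 \cdot |\xi_{t+1}|^2 \cdot |A|\right)$$
$$\le \frac{3}{E_0^2 t^2}\left(K_1 \cdot V_t + o(1) + o(1) \cdot |\xi_{t+1}|^2\right)$$

where $K_1$ is a positive constant such that

$$\Delta_t^T\varphi'(0)^T A\varphi'(0)\Delta_t \le K_1\Delta_t^T A\Delta_t = K_1 V_t.$$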

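From

$$(t+1)B_t \le \frac{3(t+1)}{E_0^2 t^2}\left(K_1 \cdot V_t + o(1) + o(1) \cdot |\xi_{t+1}|^2\right)$$

and using

• $\frac{3(t+1)K_1}{E_0^2 t^2} \le \frac{K}{t}$, for some positive constant $K$;

• $\frac{3(t+1)}{E_0^2 t^2}\,o(1) = o(t^{-1})$;

• $E[|\xi_{t+1}|^2] = \mathrm{tr}(S_\xi)$;

we have

$$E[(t+1)B_t\, I(t < \nu)] = \frac{K}{t}V_t + o(t^{-1}).$$

Now we expand $C_t$, with $\Delta_t = x_t - z_t$,

$$C_t = -\frac{1}{E_0 t}\Delta_t^T A\varphi'(0)\Delta_t - \frac{1}{E_0 t}\Delta_t^T\varphi'(0)^T A\Delta_t = -\frac{1}{t}\Delta_t^T\left(A\varphi'(0)/E_0 + \varphi'(0)^T/E_0\, A\right)\Delta_t.$$

Aiming at a useful estimate of $C_t$, we find a matrix $A$ which verifies $A\varphi'(0)/E_0 + \varphi'(0)^T/E_0\, A = I + A$, and we also use $I + A \ge (1 + \beta)A$ for a real positive constant $\beta$. We write, for $A = A^T$,

$$A\varphi'(0)/E_0 + \varphi'(0)^T/E_0\, A = I + A \iff$$
$$\varphi'(0)^T/E_0\, A + A\varphi'(0)/E_0 = I + A \iff$$
$$\varphi'(0)^T/E_0\, A - \frac{A}{2} + A\varphi'(0)/E_0 - \frac{A}{2} = I \iff$$
$$\left(\varphi'(0)^T/E_0 - \frac{I}{2}\right)A + A\left(\varphi'(0)/E_0 - \frac{I}{2}\right) = I$$

and in order to use Lyapunov's result (Theorem 3) we write the last equality as

$$\left(\frac{I}{2} - \varphi'(0)^T/E_0\right)(-A) + (-A)\left(\frac{I}{2} - \varphi'(0)/E_0\right) = I$$

where, from Assumption B3.3, $\frac{I}{2} - \varphi'(0)/E_0$ is negative definite; therefore the solution $A$ exists and is positive definite. Finally,

$$C_t = -\frac{1}{t}\Delta_t^T\left(A\varphi'(0)/E_0 + \varphi'(0)^T/E_0\, A\right)\Delta_t = -\frac{1}{t}\Delta_t^T(A + I)\Delta_t.$$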



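We estimate the last term $D_t$, with $\Delta_t = x_t - z_t$,

$$D_t = -\frac{1}{E_0 t}\left(2\Delta_t^T A v_t \cdot |x_t| + 2\Delta_t^T A\,o_t\xi_{t+1}\right).$$

Recall that we are considering $t < \nu$ and, because we cannot use $|\Delta_t| \le V_t$, we proceed as follows:

• $x_t = \Delta_t + z_t$, from where $|x_t|^2 \le |\Delta_t|^2 + |z_t|^2$;

• $2|\Delta_t|^2 \le K_2 V_t$ (the factor 2 is for convenience) for a certain positive constant $K_2$.

Then,

$$2\Delta_t^T A v_t \cdot |x_t| \le 2|\Delta_t| \cdot |x_t| \cdot |A| \cdot c_t \le (|\Delta_t|^2 + |x_t|^2) \cdot |A| \cdot c_t \le (2|\Delta_t|^2 + |z_t|^2) \cdot |A| \cdot c_t \le (K_2 V_t + |z_t|^2) \cdot |A| \cdot c_t.$$

Considering again the estimate of $D_t$,

$$D_t = -\frac{1}{E_0 t}\left(2\Delta_t^T A v_t \cdot |x_t| + 2\Delta_t^T A\,o_t\xi_{t+1}\right) \le \frac{K_2}{E_0 t} \cdot |A| \cdot c_t \cdot V_t + \frac{1}{E_0 t} \cdot |A| \cdot c_t \cdot |z_t|^2 - \frac{2}{E_0 t}\Delta_t^T A\,o_t\xi_{t+1}.$$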



Taking

• $E[|z_t|^2] = K_4/t$, for some constant $K_4$;

then

$$E[(t+1)D_t] \le \frac{t+1}{E_0 t} \cdot K_2 \cdot |A| \cdot c_t \cdot V_t + \frac{t+1}{E_0 t} \cdot |A| \cdot c_t \cdot \frac{K_4}{t} \le o(1)V_t + o(t^{-1}).$$



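Now, putting everything together, always considering $t < \nu$,

$$(t+1)V_{t+1} \le (t+1)V_t + \frac{K}{t}V_t + o(t^{-1}) - \frac{t+1}{t}(1+\beta)V_t + o(1)V_t + o(t^{-1})$$
$$\le tV_t\left(1 + \frac{1}{t} - (1+\beta)\frac{t+1}{t^2} + \frac{K}{t^2} + o(t^{-1})\right) + o(t^{-1})$$
$$\le tV_t\left(1 - \frac{\beta}{t} + o(t^{-1})\right) + o(t^{-1})$$
$$\le tV_t\left(1 - \frac{\beta + o(1)}{t}\right) + o(t^{-1})$$
$$\le tV_t\left(1 - \frac{b}{t}\right) + o(t^{-1})$$

for some constant $0 < b < \beta$ and all sufficiently large $t$. It follows that

$$E[(t+1)V_{t+1}\, I(t+1 < \nu)] \le \left(1 - \frac{b}{t}\right)E[tV_t\, I(t < \nu)] + o(t^{-1})$$

and, by Lemma 11,

$$E[tV_t\, I(t < \nu)] \to 0;$$

then, by Theorem 5,

$$tV_t\, I(t < \nu) \xrightarrow{P} 0$$

or

$$\sqrt{t}(x_t - z_t)\, I(t < \nu) \xrightarrow{P} 0,$$

or even, by the definition of convergence in probability,

$$\forall \eta > 0 \quad P(|\sqrt{t}(x_t - z_t)\, I(t < \nu)| < \eta) \to 1.$$

The following events are related by

$$\{|\sqrt{t}(x_t - z_t)| \ge \eta\} \subset \{|\sqrt{t}(x_t - z_t)\, I(t < \nu)| \ge \eta\} \cup \{\nu \le t\},$$

so $P(|\sqrt{t}(x_t - z_t)| \ge \eta) \le P(|\sqrt{t}(x_t - z_t)\, I(t < \nu)| \ge \eta) + P(\nu < \infty)$, and since by (49) $P(\nu < \infty)$ can be made arbitrarily small, we have

$$\sqrt{t}(x_t - z_t) \xrightarrow{P} 0.$$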
4 Some standard results

Theorem 3 (A. M. Lyapunov, 1947 (cited in [5], Chap. 13.1)) Let $U, W \in \mathbb{C}^{n \times n}$ and let $W$ be positive definite.

(a) If $U$ is stable then the equation

$$UA + AU^* = W$$

has a unique solution $A$, which is negative definite.

(b) If there exists a negative definite matrix $A$ satisfying the above equation then $U$ is stable.

Comment 9 A matrix is called stable when all its eigenvalues have negative real part. A symmetric matrix whose eigenvalues are all negative is negative definite.
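Theorem 3(a) can be illustrated in the simplest setting of a real diagonal stable matrix $U = \mathrm{diag}(w_1, \ldots, w_n)$ with all $w_i < 0$, where the equation $UA + AU^T = W$ decouples entrywise into $(w_i + w_j)A_{ij} = W_{ij}$. The sketch below is restricted to that diagonal case (our simplification) and the helper name `lyapunov_diagonal` is hypothetical.

```python
def lyapunov_diagonal(w, S):
    """Solve U A + A U^T = S for U = diag(w) with all w[i] < 0:
    entrywise, (w[i] + w[j]) A[i][j] = S[i][j]."""
    n = len(w)
    return [[S[i][j] / (w[i] + w[j]) for j in range(n)] for i in range(n)]

w = [-1.0, -3.0]
S = [[2.0, 0.5], [0.5, 1.0]]            # symmetric positive definite
A = lyapunov_diagonal(w, S)
residual = max(abs(w[i] * A[i][j] + A[i][j] * w[j] - S[i][j])
               for i in range(2) for j in range(2))
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
print(A, residual, det)
```

For a 2 by 2 symmetric matrix, a negative top-left entry together with a positive determinant means the matrix is negative definite, in line with the theorem.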

Lemma 14 (Markov's inequality (for example, [9])) Let $Z$ be a random variable and $g : \mathbb{R} \to [0, \infty]$ a non-decreasing function. Then

$$E\,g(Z) \ge E(g(Z);\ Z \ge c) \ge g(c)\,P(Z \ge c).$$

Theorem 4 (Martingale convergence, [9], Chap. 12) Let $M$ be a martingale for which $M_n \in \mathcal{L}^2$, $\forall n$. Then $M$ is bounded in $\mathcal{L}^2$ if and only if

$$\sum_k E[(M_k - M_{k-1})^2] < \infty$$

and when this holds we have

$$M_n \to M_\infty \text{ almost surely and in } \mathcal{L}^2.$$

Theorem 5 ([9], Chap. 13.7) Let $(X_n)$ be a sequence in $\mathcal{L}^1$ and $X \in \mathcal{L}^1$. Then $X_n \to X$ in $\mathcal{L}^1$, or equivalently $E(|X_n - X|) \to 0$, if and only if the following conditions are verified:

1. $X_n \to X$ in probability;

2. the sequence $(X_n)$ is uniformly integrable ($\forall \epsilon > 0\ \exists K : E[|X_n|;\ |X_n| > K] < \epsilon$ for all $n$).

Lemma 15 (Slutsky's Theorem, [7] Sec. 8.6) If $|X_t - Z_t| \xrightarrow{P} 0$ and $X_t$ converges in distribution, then $Z_t$ converges in distribution to the same limit.

Theorem 6 (Kolmogorov's Law of the Iterated Logarithm [9]) Let $X_1, X_2, \ldots$ be independent and identically distributed random variables with mean $0$ and variance $1$. Let $S_n := X_1 + \cdots + X_n$. Then, almost surely,

$$\limsup_{n} \frac{S_n}{\sqrt{2n\log\log n}} = +1, \qquad \liminf_{n} \frac{S_n}{\sqrt{2n\log\log n}} = -1.$$